Tutorial for retrieving data from the Swiss Open Data portal

This tutorial was originally published on DataCareer.

In this Jupyter Notebook we will retrieve data from the open data portal "opendata.swiss". The portal is based on the open source project CKAN, which stands for Comprehensive Knowledge Archive Network. CKAN provides an extensive API for the metadata of the open data catalogue. This means that the information about the datasets can be retrieved from CKAN, but the data itself will have to be downloaded from the servers of the contributors ("opendata.swiss" in this case).

In this tutorial we will take a look at the population of Switzerland using Python 3. Let's start by importing some packages we will use for this exercise.

As mentioned, the CKAN API functions as a catalog for datasets. We need to define the URL for "opendata.swiss".
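A minimal sketch of defining the URL could look like this. The endpoint path is an assumption based on the standard CKAN Action API layout (`/api/3/action/`), and `action_url` is a helper name made up for this example.

```python
from urllib.parse import urljoin

# Assumption: opendata.swiss exposes the standard CKAN Action API at this path.
BASE_URL = "https://opendata.swiss/api/3/action/"


def action_url(action):
    """Return the full URL for a CKAN action such as 'package_list'."""
    return urljoin(BASE_URL, action)


print(action_url("package_list"))
```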

Let's get a list of all the datasets (called packages in CKAN) listed by "opendata.swiss".
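One way to request the list is CKAN's `package_list` action. The sketch below uses only the standard library. The base URL is an assumption, and `build_url` and `call_action` are hypothetical helper names. The actual request is left commented out because it needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: standard CKAN Action API path on opendata.swiss.
BASE_URL = "https://opendata.swiss/api/3/action/"


def build_url(action, **params):
    """Build the request URL for a CKAN action, with optional query parameters."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{action}{query}"


def call_action(action, **params):
    """Call a CKAN action and return the decoded JSON response as a dict."""
    with urlopen(build_url(action, **params), timeout=30) as response:
        return json.load(response)


# Requires network access:
# packages = call_action("package_list")
print(build_url("package_list"))
```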

The titles of the datasets are in the key called result. Let's create a new variable called datasets and find out how many datasets are available.
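As an illustration, the snippet below uses a shortened stand-in for the JSON response instead of a live request. The real list is much longer. The `success` flag is part of the usual CKAN response envelope.

```python
# Shortened stand-in for the JSON returned by the package_list action.
packages = {
    "success": True,
    "result": ["bevolkerung", "bruttoinlandprodukt", "elektroautos"],
}

# CKAN reports whether the call worked in the "success" flag.
if not packages["success"]:
    raise RuntimeError("The CKAN API call did not succeed")

# The dataset names live under the "result" key.
datasets = packages["result"]
print(f"{len(datasets)} datasets found")
```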

This is quite an extensive list. We can print the last 10 to the screen to get an idea of the titles:
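A negative slice returns the tail of a list. The names below are placeholders made up for this example, not actual datasets from the portal.

```python
# Placeholder names standing in for the real list of datasets.
datasets = [f"dataset-{i:02d}" for i in range(1, 13)]

# A negative slice keeps the last 10 entries, in their original order.
last_ten = datasets[-10:]
for name in last_ten:
    print(name)
```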

For this exercise we will take one from this list, called "bruttoinlandprodukt". Other examples could be: 'bevolkerung', 'elektroautos', or 'bevolkerungsdaten-im-zeitvergleich'.

Now let's download the package/dataset information. We need to take a few steps:
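The steps can be sketched as follows: request the package with the `package_show` action, check that the call succeeded, and collect the format and URL of each resource. The response below is a trimmed, hypothetical stand-in (the GitHub URL is a placeholder), and `list_resources` is a helper name made up for this example.

```python
from pprint import pprint

# Trimmed, hypothetical stand-in for the response of
# package_show?id=bruttoinlandprodukt. Real responses carry many more fields.
package_response = {
    "success": True,
    "result": {
        "name": "bruttoinlandprodukt",
        "resources": [
            {
                "format": "CSV",
                "url": "https://github.com/example-org/example-repo/blob/master/data.csv",
            },
        ],
    },
}


def list_resources(response):
    """Return (format, url) pairs for every resource in a package_show response."""
    if not response.get("success"):
        raise RuntimeError("The CKAN API call did not succeed")
    resources = response["result"]["resources"]
    return [((r.get("format") or "").lower(), r["url"]) for r in resources]


for fmt, url in list_resources(package_response):
    print(fmt, url)

# Uncomment to inspect the full package information:
# pprint(package_response)
```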

Did you walk through the information above? You can uncomment the last line (with pretty print) to check out the package information. Is this indeed the dataset you are interested in? If yes, then you need to download the dataset. It is also important to know the format of the dataset for the next steps. This information is also listed in the package information above.

Notice that this particular dataset is hosted at GitHub. When downloading from GitHub, it is better to request the raw data. We need to rewrite the URL a little bit to get there.
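A possible rewrite is sketched below. GitHub serves raw file content from `raw.githubusercontent.com`, with the `/blob/` segment dropped from the path. The function name and the example URL are placeholders made up for this example.

```python
def to_raw_github_url(url):
    """Turn a GitHub 'blob' page URL into its raw-content equivalent.

    URLs that do not look like GitHub blob links are returned unchanged.
    """
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        return url
    raw = url.replace(prefix, "https://raw.githubusercontent.com/", 1)
    return raw.replace("/blob/", "/", 1)


# Placeholder URL, not the actual location of the dataset.
page_url = "https://github.com/example-org/example-repo/blob/master/data.csv"
print(to_raw_github_url(page_url))
```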

Feel free to follow the URLs above. It is good to take a sneak peek so you know what the data will look like. The dataset can come in different formats, so let's specify which ones we are willing to accept and load them into a Pandas DataFrame.
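One way to do this is a small lookup from format name to Pandas reader. The set of accepted formats and the name `load_resource` are assumptions for this sketch. It reads from an in-memory sample with made-up values so it runs without network access. The Pandas readers also accept a URL in place of the file-like object.

```python
import io

import pandas as pd

# Formats we are willing to accept, mapped to the Pandas function that reads them.
READERS = {
    "csv": pd.read_csv,
    "json": pd.read_json,
    "xls": pd.read_excel,
    "xlsx": pd.read_excel,
}


def load_resource(source, fmt):
    """Load a resource (URL, path, or file-like object) into a DataFrame."""
    try:
        reader = READERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"Unsupported format: {fmt!r}") from None
    return reader(source)


# In-memory sample with made-up values, standing in for the downloaded file.
sample_csv = io.StringIO("year,value\n2015,1.0\n2016,2.0\n")
df = load_resource(sample_csv, "csv")
print(df.head())
```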

As you can see, we need to make a few adjustments before we can continue. It is best to clean up the dataset before you start doing your analysis.
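Which adjustments are needed depends on the dataset. The sketch below shows a few common ones on a made-up DataFrame: normalising column names, dropping empty rows, and converting text to numbers. The column names and values are hypothetical and do not reflect the actual dataset.

```python
import pandas as pd

# Made-up raw data with untidy headers, text values, and an empty row.
raw = pd.DataFrame(
    {
        " Year ": ["2015", "2016", None],
        "Value": ["1.5", "n/a", None],
    }
)

# Normalise the column names: strip whitespace and use lower case.
clean = raw.rename(columns=lambda name: name.strip().lower())

# Drop rows in which every value is missing.
clean = clean.dropna(how="all")

# Convert text to numbers; entries that cannot be parsed become NaN.
clean["year"] = pd.to_numeric(clean["year"], errors="coerce")
clean["value"] = pd.to_numeric(clean["value"], errors="coerce")

print(clean)
```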

That's it! Now let's visualise the data with the built-in plot functionality of Pandas, which is based on 'matplotlib'.
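A minimal plotting sketch, assuming a cleaned DataFrame with a `year` and a `value` column. Both the column names and the numbers are made up for this example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook

import pandas as pd

# Made-up values standing in for the cleaned dataset.
df = pd.DataFrame({"year": [2015, 2016, 2017], "value": [1.0, 1.5, 2.0]})

# DataFrame.plot draws a line chart by default and returns a matplotlib Axes.
ax = df.plot(x="year", y="value", title="Hypothetical values per year")
ax.set_ylabel("value")

# In a notebook the figure renders inline. In a script, save it instead:
# ax.figure.savefig("plot.png")
```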